perf: speedup date parsing using ciso8601 #590
Conversation
Relying on the regex gives about the same performance, but should let the tests pass:
In fact, I think we can assume we know where the date fields are, right? That gives us the following performance:
And when using ciso8601:
This is 10x faster.
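For reference, a minimal sketch of the targeted approach described in this thread: parse only the known 'date' field, using ciso8601 when it is installed and falling back to the existing regex-based parser otherwise. The helper name below is illustrative, not the exact code in this PR.

```python
try:
    import ciso8601  # optional C-accelerated ISO 8601 parser
except ImportError:
    ciso8601 = None

from jupyter_client.jsonutil import parse_date  # existing regex-based parser


def parse_header_date(header):
    """Parse only the known 'date' field of a message header, in place."""
    date = header.get('date')
    if isinstance(date, str):
        if ciso8601 is not None:
            header['date'] = ciso8601.parse_datetime(date)
        else:
            header['date'] = parse_date(date)
    return header
```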
Will you include your changes? Thanks for working on this!
Well, jupyter_client users …
This is great! We did something similar in ipython/ipyparallel#424, where we discovered how expensive date parsing is when there are lots of messages, but we disabled it entirely, since that skips the regex match as well.
What I think would be even better is to remove date parsing entirely, but that's a backward-incompatible change. That said, I think very few things (ipyparallel may even be the only one) really rely on dates being parsed as part of deserialization, so it should be a major change with relatively minor disruption.
What do you think about skipping extract_dates entirely as a major revision change? I can't think of a great mechanism for gracefully deprecating this functionality. Maybe a config option on Session with a deprecation warning?
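One possible shape for that config option, as a rough illustration only: the trait name, default, and warning text are assumptions, and the class below is a stand-in for jupyter_client.session.Session, not the real thing.

```python
import warnings

from traitlets import Bool
from traitlets.config import Configurable

from jupyter_client.jsonutil import extract_dates as _extract_dates


class Session(Configurable):
    """Stand-in for jupyter_client.session.Session, showing only the proposed trait."""

    extract_dates = Bool(
        True,
        help="Parse ISO 8601 date strings into datetime objects during "
             "deserialization. This behavior is deprecated.",
    ).tag(config=True)

    def _maybe_extract_dates(self, obj):
        """Apply the legacy date extraction only if still enabled, warning about it."""
        if self.extract_dates:
            warnings.warn(
                "Implicit date parsing during deserialization is deprecated; "
                "set Session.extract_dates = False to opt out.",
                DeprecationWarning,
                stacklevel=2,
            )
            return _extract_dates(obj)
        return obj
```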
```diff
-      run: py.test --cov jupyter_client jupyter_client
+      run: |
+        py.test --cov jupyter_client jupyter_client
+        pip install ciso8601
```
Can this be added to the existing pip install, one step above, instead of being part of the test stage?
This is a legacy from when we supported Pythons that didn't even have json in the standard library and there were fiddly issues with str vs bytes. There's no need to keep that, as long as we use something that produces valid UTF-8 JSON bytes.
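For context, a minimal sketch of a packer/unpacker producing valid UTF-8 JSON bytes. The names mirror jupyter_client's json_packer/json_unpacker, but this is an illustration of the idea, not the library's exact implementation:

```python
import json
from datetime import datetime


def json_default(obj):
    """Serialize datetimes as ISO 8601 strings; everything else must be JSON-native."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def json_packer(obj):
    """Pack a message dict into UTF-8 encoded JSON bytes."""
    return json.dumps(obj, default=json_default, ensure_ascii=False).encode("utf8")


def json_unpacker(buf):
    """Unpack UTF-8 JSON bytes (or str) back into a Python object."""
    if isinstance(buf, bytes):
        buf = buf.decode("utf8")
    return json.loads(buf)
```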
```diff
         message['msg_id'] = header['msg_id']
         message['msg_type'] = header['msg_type']
-        message['parent_header'] = extract_dates(self.unpack(msg_list[2]))
+        message['parent_header'] = self.unpack(msg_list[2])
+        if 'date' in message['parent_header']:
```
This will break current stable ipyparallel, which currently relies on the behavior of parsing any date-like string, but I can deal with that. I never should have made it find and parse every valid date string!
I suspect that parsing just the date will break exactly as many things as disabling parsing entirely, though. Which is to say: ipyparallel and probably nothing else.
Hi @maartenbreddels! We're pushing for a 7.0 release, do you want to pick this back up?
This is a cutout of running the profiler on voila, where it handles a single comm msg:
As can be seen, the green 'parse_date' is a significant part of the CPU usage.
With this PR, it shrinks quite a bit:
This PR uses ciso8601 to do the datetime parsing, which is significantly faster.
Note that ciso8601 is not a required dependency; it will only be used when installed. However, one test regarding timezones fails (it should fail to convert and it does not; I'm not sure how important that is).
Running it in a benchmark
Before:
After:
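The benchmark numbers did not survive in this copy. A sketch of how such a micro-benchmark could be run, comparing the existing regex-based extract_dates with a targeted ciso8601 parse of the 'date' field (the sample header and iteration count are made up for illustration):

```python
import timeit

from jupyter_client.jsonutil import extract_dates

try:
    import ciso8601
except ImportError:
    ciso8601 = None

# A header shaped like a typical Jupyter message header.
header = {
    'msg_id': 'abc123',
    'msg_type': 'comm_msg',
    'date': '2021-03-05T12:34:56.789012Z',
}

n = 100_000
t_regex = timeit.timeit(lambda: extract_dates(dict(header)), number=n)
print(f"extract_dates (regex scan over all values): {t_regex:.3f}s for {n} headers")

if ciso8601 is not None:
    t_ciso = timeit.timeit(lambda: ciso8601.parse_datetime(header['date']), number=n)
    print(f"ciso8601.parse_datetime ('date' only):      {t_ciso:.3f}s for {n} headers")
```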